Evolutionary Learning Outperforms Reinforcement Learning on Non-Markovian Tasks

Authors

  • G. de Croon
  • M. F. van Dartel
Abstract

Artificial agents are often trained to perform non-Markovian tasks, i.e., tasks in which the sensory inputs can be ambiguous. Agents typically learn how to perform such tasks using either reinforcement learning (RL) or evolutionary learning (EL). In this paper, we empirically demonstrate that these learning methods result in different levels of performance when applied to a non-Markovian task: the Active Categorical Perception (ACP) task. In the ACP task, the proportion of ambiguous sensor states can be varied. EL outperforms RL for all tested proportions of ambiguous states. In addition, we show that the relative performance difference between RL and EL increases with the proportion of ambiguous sensor states. We argue that the cause of this increasing performance difference is that in RL the learned policy consists of those state-action pairs that individually have the highest estimated values, whereas the performance of a policy for a non-Markovian task depends strongly on the combination of state-action pairs selected.
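The argument in the final sentence can be made concrete with a toy sketch (not from the paper): when two hidden states share one observation, Q-values estimated per observation-action pair average rewards across the aliased states, so the greedily selected pairs need not form the best joint policy. All states, rewards, and names below are illustrative assumptions.

```python
# Toy aliased task: hidden states s1 and s2 both emit observation "y".
# From s0 (obs "x"): action "A" -> s1 with reward +1, "B" -> s2 with reward 0.
# From s1: "A" ends with +1, "B" ends with 0.
# From s2: "A" ends with 0,  "B" ends with +10.
from itertools import product

STEP = {  # (hidden state, action) -> (reward, next hidden state or None)
    ("s0", "A"): (1.0, "s1"),
    ("s0", "B"): (0.0, "s2"),
    ("s1", "A"): (1.0, None),
    ("s1", "B"): (0.0, None),
    ("s2", "A"): (0.0, None),
    ("s2", "B"): (10.0, None),
}
OBS = {"s0": "x", "s1": "y", "s2": "y"}

def true_return(policy):
    """Actual return of a memoryless policy {observation: action} from s0."""
    state, total = "s0", 0.0
    while state is not None:
        reward, state = STEP[(state, policy[OBS[state]])]
        total += reward
    return total

# Aliased Q estimates under uniform-random behaviour: observation "y"
# mixes s1 and s2 equally, so Q(y, a) averages their rewards.
q_y = {a: 0.5 * (STEP[("s1", a)][0] + STEP[("s2", a)][0]) for a in "AB"}
# Both actions at "x" lead to the same observation "y", so the bootstrap
# term max_a Q(y, a) is identical; only the immediate reward differs.
q_x = {a: STEP[("s0", a)][0] + max(q_y.values()) for a in "AB"}

greedy = {"x": max(q_x, key=q_x.get), "y": max(q_y, key=q_y.get)}
best = max(
    (dict(zip("xy", acts)) for acts in product("AB", repeat=2)),
    key=true_return,
)
print("greedy policy:", greedy, "return:", true_return(greedy))  # return 1.0
print("best policy:  ", best, "return:", true_return(best))      # return 10.0
```

Here the per-pair greedy choice (A at "x", B at "y") yields a return of 1, while joint enumeration of all four memoryless policies finds (B, B) with a return of 10: the value of each pair depends on which other pairs are selected, which is the combination effect the abstract describes.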


Similar references

HQ-Learning: Discovering Markovian Subgoals for Non-Markovian Reinforcement Learning

To solve partially observable Markov decision problems, we introduce HQ-learning, a hierarchical extension of Q-learning. HQ-learning is based on an ordered sequence of subagents, each learning to identify and solve a Markovian subtask of the total task. Each agent learns (1) an appropriate subgoal (though there is no intermediate, external reinforcement for "good" subgoals), and (2) a Markovia...


Reinforcement Learning with LSTM in Non-Markovian Tasks with Long-Term Dependencies

This paper presents reinforcement learning with a Long Short-Term Memory recurrent neural network: RL-LSTM. Model-free RL-LSTM using Advantage(λ) learning and directed exploration can solve non-Markovian tasks with long-term dependencies between relevant events. This is demonstrated in a T-maze task, as well as in a difficult variation of the pole balancing task.


Human learning in non-Markovian decision making

Humans can learn under a wide variety of feedback conditions. Particularly important types of learning fall under the category of reinforcement learning (RL), where a series of decisions must be made and a sparse feedback signal is obtained. Computational and behavioral studies of RL have focused mainly on Markovian decision processes (MDPs), where the next state and reward depend only on the c...


A Cultural Algorithm for POMDPs from Stochastic Inventory Control

Reinforcement Learning algorithms such as SARSA with an eligibility trace, and Evolutionary Computation methods such as genetic algorithms, are competing approaches to solving Partially Observable Markov Decision Processes (POMDPs) which occur in many fields of Artificial Intelligence. A powerful form of evolutionary algorithm that has not previously been applied to POMDPs is the cultural algor...
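For reference on the evolutionary side of this comparison, below is a minimal generational genetic algorithm over bitstring policies with tournament selection and bit-flip mutation — a plain GA, not the cultural algorithm the paper proposes. The fitness function here is a placeholder (OneMax) standing in for the expected return of a memoryless POMDP policy; all names and parameters are illustrative assumptions.

```python
import random

random.seed(0)

GENES = 12                     # one bit per observation: which of two actions
POP, GENS, MUT = 30, 60, 1.0 / GENES

def fitness(policy):
    # Placeholder objective standing in for "expected return of the
    # memoryless policy on the POMDP"; here simply the number of 1-bits.
    return sum(policy)

def mutate(policy):
    # Flip each bit independently with probability MUT.
    return [b ^ (random.random() < MUT) for b in policy]

def tournament(pop):
    # Pick the fittest of 3 randomly sampled individuals.
    return max(random.sample(pop, 3), key=fitness)

pop = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP)]
for _ in range(GENS):
    pop = [mutate(tournament(pop)) for _ in range(POP)]

best = max(pop, key=fitness)
print("best policy:", best, "fitness:", fitness(best))
```

Unlike value-based RL, this search evaluates each candidate policy as a whole, which is why evolutionary methods sidestep the per-pair value-estimation problem discussed in the main abstract.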


First Step Towards Continual Learning

Continual learning is the constant development of increasingly complex behaviors; the process of building more complicated skills on top of those already developed. A continual-learning agent should therefore learn incrementally and hierarchically. This paper describes CHILD, an agent capable of Continual, Hierarchical, Incremental Learning and Development. CHILD can quickly solve complicated no...



Journal:

Volume   Issue 

Pages  -

Publication year: 2005